IEEE Transactions on Medical Imaging — Latest Matching Preprints

1

TopBrain Segmentation Challenge for Whole Brain Vessel Anatomy

Yang, K.; Shi, P.; Huang, H.; Musio, F.; Baazaoui, H.; Aydin, O. U.; Hilbert, A.; Hamadache, R. E.; Yalcin, C.; Zhang, M.; Falcetta, D.; de la Rosa, E.; Shit, S.; Prabhakar, C.; Wittmann, B.; Rokuss, M. R.; Kirchhoff, Y.; Al-Maskari, R.; Hoeher, L.; Juchler, N.; Casamitjana, A.; Cleary, J.; Schmick, A.; Baumgartner, P.; Deseoe, J.; Vandans, O.; Lee, D.; Oh, K.; LaBella, D.; Mazher, M.; Niederer, S. A.; Qayyum, A.; Liu, Y.; Chen, J.; Kim, W.; Asawalertsak, N.; Kim, M.; Shin, D.; Park, S.-H.; Kikuchi, S.; Zhang, Y.; Liu, J.; Cui, Y.; Qiu, Y.; Verschuur, A.; Zhang, J.; van der Schaaf, I.; Su, R.;

2026-05-30 radiology and imaging 10.64898/2026.05.28.26354312 medRxiv

Top 0.1%

8.4%

We present the TopBrain 2025 Challenge, the first benchmark for fine-grained multiclass segmentation of the whole brain vasculature in both computed tomography angiography (CTA) and magnetic resonance angiography (MRA). Building on the TopCoW challenge, TopBrain scales vessel annotation from the Circle of Willis to the entire brain, introducing a dataset of 90 annotated volumes across 48 landmark vessel classes spanning arterial and venous systems, of which 50 training volumes are publicly released. Vessel definitions were consolidated from established neuroanatomical references into a unified annotation scheme, and vessel caliber measurements along the centerline are reported for the first time across the whole brain vascular anatomy. To address the unique challenges of multiclass brain vessel segmentation, we propose an evaluation framework that accounts for detection in segmentation performance, assesses anatomical plausibility, and introduces novel contamination metrics that characterize inter-class prediction errors. Fifteen teams from over 220 registered participants submitted algorithms to the benchmark. The top-performing teams built on nnUNet with principled system design choices, achieving around 80% Dice scores, near-zero invalid neighbor counts, over 60% F1 scores for side-road vessels, and below 18% foreground contamination ratio. Larger vessels are easier to segment, while smaller and more complex vessels remain the true bottleneck. The annotated datasets and podium-finish algorithms are made publicly available on Zenodo.

2

DISCERN: A Clinical Impact-aware Framework for Radiology Report Comparison

Sharma, R.; Beeche, C.; Dong, J.; Zhuang, R.; Qu, H.; Zhang, R.; Gangaram, V.; Goswami, P.; Xin, J.; Ballard, J.; Goldberg, A.; Sagreiya, H.; Long, Q.; Chen, T.; Witschey, W. R.

2026-05-27 radiology and imaging 10.64898/2026.05.26.26353612 medRxiv

Top 0.3%

1.7%

The surge in medical imaging has spurred the development of vision-language models (VLMs) to alleviate radiologist workloads. However, clinical deployment is hindered by the lack of meaningful evaluation frameworks. Current metrics - ranging from semantic similarity to large language model (LLM) based judges - often fail to distinguish between clinically trivial and critical discrepancies, poorly reflecting real-world clinical judgment. To address this, we introduce DISCERN (Discordance and Significance-aware Entity-level Radiology Report Comparison). DISCERN is a significance-aware framework that weighs report errors based on their potential impact on patient care. Our results demonstrate that DISCERN powered by closed source LLMs aligns more closely with expert radiologist assessments than traditional metrics or current LLM evaluators, providing a more interpretable and clinically relevant benchmark. By modeling radiologist prioritization and entity-level feedback, DISCERN facilitates targeted model refinement and ensures the safer integration of generative AI into clinical workflows.

3

High Resolution Multi-depth Quantification of the Retinal Nerve Fiber Layer

Callet, C.; Bertrand, M.; Guzman, K.; Mece, P.; Rossi, E. A.; Grieve, K.

2026-06-01 ophthalmology 10.64898/2026.05.22.26353127 medRxiv

Top 0.5%

0.8%

The retinal nerve fiber layer, composed of axon bundles converging toward the optic nerve, is a key biomarker for diagnosing and monitoring glaucoma and other neurodegenerative diseases. High-resolution en face imaging of individual nerve fiber bundles offers morphological information beyond what conventional optical coherence tomography provides, yet clinical integration remains limited by the lack of automated analysis tools and normative data. Here, we imaged 14 healthy volunteers using time-domain full-field optical coherence tomography and adaptive optics scanning laser ophthalmoscopy, and developed automated pipelines to quantify bundle width, trajectory, tortuosity, and orientation. Bundles were on average 25% wider at shallower retinal depths, width measurements were consistent across imaging modalities, and estimated axon count per bundle decreased significantly with age. Global trajectory analysis revealed systematic deviations of high resolution data from existing mathematical models, particularly in the temporal sector, leading us to propose two refined trajectory models. These normative results provide a foundation for high resolution biomarkers for use in investigations of retinal neurodegeneration.

4

Voxel-wise temporal decomposition of hypoxia-targeted BOLD MRI: method development and proof-of-concept application in glioblastoma

Schmidlechner, T.; Stumpo, V.; Jehli, E.; Zerweck, L.; Bellomo, J.; Gönel, M.; Müller, F.; Sebök, M.; Bink, A.; Kulcsar, Z.; Weller, M.; Regli, L.; Fierstra, J.; van Niftrik, C. H. B.

2026-05-29 radiology and imaging 10.64898/2026.05.27.26354265 medRxiv

Top 0.6%

0.6%

Hypoxia-targeted BOLD MRI is a novel technique, which probes oxygenation physiology in response to a controlled transient hypoxia stimulus. In glioblastoma, the signal response is spatially and temporally heterogeneous. We developed a voxel-wise temporal decomposition framework for hypoxia-targeted BOLD MRI that separates the arrival of responses, transition phases, and steady state during controlled isocapnic hypoxia. Twenty healthy controls underwent 3-T BOLD MRI during a double hypoxic step challenge to establish a normative reference. Three patients with newly diagnosed glioblastoma were included as proof-of-concept cases. For each voxel, we estimated response arrival delay (Delaycorr), delay to plateau, delay to return and an O2-normalized steady-state response (HypoxiaSS). Healthy-control maps were used to construct a voxel-wise normative atlas and, for HypoxiaSS, a global-response-adjusted model for patient deviation mapping. In healthy controls, HypoxiaSS showed lower supratentorial between-subject variability than both whole-stimulus comparators (coefficient of variation: 1.77 versus 2.36 for Hypoxiaavg) and higher voxel-level step-to-step agreement (ICC(2,1): median 0.951 versus 0.792 for Hypoxiaavg). Whole-stimulus averaging exhibited a systematic step-2 signal amplification present in 19 of 20 subjects, which was absent from HypoxiaSS. A single global response scalar explained a median 72.5% of voxel-wise between-subject variance in HypoxiaSS. In proof-of-concept patient analyses, G-adjusted HypoxiaSS deviation maps and timing maps identified spatially coherent abnormalities that were partly complementary and extended beyond conventional MRI-defined lesion margins. Temporal decomposition improves the stability and interpretability of hypoxia-targeted BOLD MRI and provides a practical framework for population-referenced physiological mapping and atlas-based deviation mapping in glioblastoma.

5

VOGeo-Gaze: Calibration-Free, Geometry-Aware Deep Learning for Real-Time Gaze Tracking in Clinical Video-Oculography

Zhao, J.; Ahmadi, S.-A.; Decker, J.; Zwergal, A.; Eulenburg, P. z.; Flanagin, V. L.; Wuehr, M.

2026-05-29 health informatics 10.64898/2026.05.27.26354254 medRxiv

Top 0.7%

0.5%

Quantitative eye movement analysis is important for neurological diagnostics, yet existing video-oculography (VOG) systems typically require calibration, device-specific settings, or accurate gaze labels. We present VOGeo-Gaze, a real-time, calibration-free, geometry-aware neural network that estimates gaze by reconstructing anatomically meaningful eyeball parameters from image features. The method combines segmentation-driven projection geometry, a refraction-aware pupil correction module, and temporal anatomical stabilization, so gaze is derived from interpretable eye geometry rather than direct angular regression. Trained only on the public TEyeD dataset with weak gaze supervision, VOGeo-Gaze was evaluated on 116 clinical recordings from 17 patients and 19 healthy subjects using EyeSeeCam, a clinical gold-standard VOG system. It achieved median absolute angular errors of 0.33° horizontally and 0.35° vertically, with nearly 92% of recordings below 1° error while operating at >300 FPS. These results demonstrate sub-degree clinical gaze estimation without subject-specific calibration, camera intrinsics, or accurate gaze labels, providing a scalable and interpretable alternative to conventional VOG pipelines. Code is available at https://github.com/DSGZ-MotionLab/VOGeo-Gaze.

6

Automated quantification of cerebral microbleeds for ARIA-H monitoring in Aging and Alzheimer's Disease: A multicenter deep learning validation

Low, Z. X. B.; Rowsthorn, E.; Nazem-Zadeh, M.-R.; Francis, M.; Robb, C.; Howcroft, M.; Whiriskey, R.; Brodtmann, A.; McNeil, J. J.; Law, M.

2026-05-26 radiology and imaging 10.64898/2026.05.19.26353364 medRxiv

Top 1%

0.3%

We trained a self-configuring nnU-Net model for CMB segmentation in a heterogeneous multicenter sample (n=264), including 1.5T and 3T field strengths, SWI and T2*-GRE sequences, and community and clinical cohorts. Model performance was evaluated using 5-fold cross-validation with a focus on object-level detection metrics. Real-world performance was evaluated on scans from an unseen dataset of people with cerebrovascular disease (n=20). The model achieved 0.82 cluster Dice, 0.88 precision, and 0.77 sensitivity on hold-out test data. Notably, the model demonstrated a low false-positive rate, averaging 0.58 false positives (FPs) per scan, an improvement on existing publicly available models. The model achieved high performance in a dataset of people with Alzheimer's disease and mild cognitive impairment (0.89 cluster Dice, 0.94 sensitivity), supporting its utility in clinical settings where ARIA-H monitoring is critical. In external validation, the model remained robust, with 0.79 sensitivity and 0.95 FPs per scan. By leveraging a heterogeneous training strategy and a self-adapting architecture, we demonstrate that deep learning can achieve high-precision CMB detection that is robust to domain shifts. The low FP rate suggests this publicly available pipeline is suitable for automated screening and lesion counting in heterogeneous large-scale clinical trials, reducing the burden of manual quantification.

7

Weight-Guided Constraints for Body Model and Lead Selection in Pediatric CIED MRI Safety Simulations

Hameed, S.; Henry, K.; Jiang, F.; Bhusal, B.; Dillenbeck, H.; Gakenheimer-Smith, L.; Webster, G.; Golestani Rad, L.

2026-05-30 radiology and imaging 10.64898/2026.05.26.26354162 medRxiv

Top 1%

0.3%

Pediatric patients with cardiac implantable electronic devices (CIEDs) face limited MRI access due to RF-induced heating, and computational modeling is increasingly used to characterize this risk. The validity of these simulations, however, depends on pairing body models with clinically realistic lead configurations, guidance that is currently lacking. We retrospectively analyzed 302 CIED surgeries in 281 pediatric patients to derive weight-based constraints for simulation design. Weight alone discriminated epicardial from endocardial lead implantation with AUC = 0.90, and adding age and height yielded no improvement, supporting weight as a sufficient single-parameter selection metric. The probabilistic crossover between approaches occurred at 44 kg, substantially higher than the 10 to 15 kg threshold commonly cited in the literature, with a broad transition zone of 21 to 66 kg in which both lead types were routinely used. Lead length was likewise weight-constrained: only 25 cm leads were observed in patients below 6 kg, and leads of 45 cm or longer were uncommon below 50 kg. These findings yield a three-tier framework, with epicardial-only configurations below 21 kg, dual configurations within 21 to 66 kg, and weight-thresholded lead lengths throughout, enabling MRI safety simulations to focus on clinically realizable anatomy and device combinations.

8

Automated Segmentation of Cerebral Arteries on Three-Dimensional Rotational Angiography Using nnUNet v2: Prospective Validation with Quantitative Metrics and Expert Qualitative Assessment

Hofmeister, J.; Brina, O.; Rosi, A.; Bernava, G.; Reymond, P.; Muster, M.; Lovblad, K.-O.; Machi, P.

2026-05-26 radiology and imaging 10.64898/2026.05.20.26353640 medRxiv

Top 1%

0.1%

Background: Three-dimensional visualization and quantitative analysis of cerebral arteries on 3DRA are central to endovascular treatment planning, device selection, and cerebrovascular research. Manual segmentation is time-consuming and operator-dependent, yet no open-source deep learning model has been prospectively validated for this task on 3DRA. Methods: An nnUNet v2 model was trained for binary cerebral artery segmentation on 400 consecutive 3DRA acquisitions from three angiographic systems, comparing four configurations across architectures and loss functions. The best-performing configurations were prospectively validated on 40 patients using a dual approach: quantitative metrics (DSC, clDice, HD95, ASD, Precision, Recall), and blinded expert qualitative evaluation by two interventional neuroradiologists assessing 12 arterial segments, a global quality score, and clinical usability across 40 test cases. Results: The ensemble model achieved median DSC 0.917, clDice 0.932, and HD95 1.494 mm. Global quality scores were significantly lower for nnUNet v2 than for expert segmentations (median 4 vs 5, p<0.001), but nnUNet v2 segmentations were rated clinically usable in 88-90% of cases versus 95-98% for expert segmentations, without significant difference on the binary usability criterion. A consistent proximal-to-distal quality gradient was identified, with comparable scores at proximal arteries and the largest differences at distal arterial segments. Conclusion: nnUNet v2 with topology-aware training provides clinically usable cerebral artery segmentations on 3DRA, prospectively validated through both quantitative metrics and structured expert qualitative assessment, and represents a reproducible open-source foundation for endovascular and research applications.

9

Assessing Lipid Core Burden Index with Depolarization-Sensitive Optical Frequency Domain Imaging

Jones, G.; Otsuka, K.; Fujisawa, N.; Yamaura, H.; Matsumoto, K.; Okamoto, A.; Yamaguchi, T.; Shimada, T.; Kagawa, S.; Yamazaki, T.; Akasaka, T.; Bouma, B. E.; Villiger, M.; Fukuda, D.

2026-06-01 cardiovascular medicine 10.64898/2026.05.22.26353889 medRxiv

Top 2%

0.1%

Background: Quantitative lipid assessment is central to identifying rupture-prone coronary plaques and represents a therapeutic target for lipid-lowering therapy. Near-infrared spectroscopy (NIRS)-derived lipid core burden index (LCBI) is well validated and widely used for detecting lipid-rich lesions. Optical frequency domain imaging (OFDI) is increasingly adopted for guiding percutaneous coronary intervention (PCI) due to its high-resolution structural imaging capabilities. Depolarization-sensitive OFDI (depOFDI) provides intrinsic lipid contrast and may enable combined structural and compositional plaque characterization within a single OFDI-based platform. Objective: To define an OFDI-derived lipid metric and evaluate its agreement with NIRS-derived LCBI. Methods: Thirty-three patients underwent both polarization-sensitive OFDI and NIRS-intravascular ultrasound imaging during PCI. After exclusion of 4 datasets, 29 co-registered pullbacks were analyzed. A signal-to-noise-corrected depolarization metric was used to identify lipid-rich regions and generate depOFDI chemograms. The maxLCBI4mm value and location, as well as total LCBI, were computed and compared with NIRS. Results: depOFDI demonstrated strong agreement with NIRS, showing high correlation for maxLCBI4mm (r^2 = 0.862) and total LCBI (r^2 = 0.867), along with strong spatial concordance for the location of the maxLCBI4mm (r^2 = 0.900). Bland-Altman analysis of LCBI4mm showed minimal bias (10.7) with 95% limits of agreement of -81.4 to 102.8. Conclusions: depOFDI enables accurate quantification of lipid burden alongside the high-resolution structural information inherently provided by OFDI. Because depolarization metrics can be derived from polarization-diverse detection available in many commercial OFDI systems, this approach provides a practical pathway toward comprehensive plaque characterization within existing PCI workflows, without the need for additional imaging modalities.
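
For readers unfamiliar with the Bland-Altman analysis mentioned in this abstract, the following is a minimal sketch of how bias and 95% limits of agreement are conventionally computed (mean difference plus or minus 1.96 sample standard deviations). The function name `bland_altman` and the input values are hypothetical illustrations, not the authors' code or data.

```python
from statistics import mean, stdev


def bland_altman(a, b):
    """Return (bias, lower_loa, upper_loa) for paired measurements a and b.

    bias is the mean of the paired differences a - b; the 95% limits of
    agreement are bias -/+ 1.96 * sample standard deviation of the differences.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two paired sequences with at least 2 values")
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical LCBI-like values for illustration only (not study data).
dep_ofdi = [310.0, 420.0, 150.0, 505.0, 260.0]
nirs = [300.0, 400.0, 170.0, 480.0, 250.0]

bias, lower, upper = bland_altman(dep_ofdi, nirs)
print(f"bias={bias:.1f}, limits of agreement=({lower:.1f}, {upper:.1f})")
```

By construction the limits are symmetric about the bias, so the midpoint of a reported interval should equal the reported bias.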

10

PIE Toolbox: SSM-PCA Based Software for PET Diagnostic Pattern Analysis

Romanov, M.; Kireev, M.; Didur, M.; Cherednichenko, D.; Korotkov, A.; Valdes-Sosa, P.; Fan, Q.; Wang, Q.

2026-06-01 radiology and imaging 10.64898/2026.05.28.26354341 medRxiv

Top 2%

0.1%

One of the prominent methods in neuroimaging data processing is SSM-PCA, which is based on principal component analysis and allows for the identification of diagnostically significant patterns in the form of statistical maps. We developed PIE Toolbox, software that employs SSM-PCA and classification based on diagnostic patterns derived from functional and structural tomographic brain imaging. The program supports the entire analysis pipeline, including preprocessing of brain images, diagnostic pattern extraction, building classification models, and prediction based on them. The resulting diagnostic patterns are weighted principal components obtained through SSM-PCA, or their linear combinations. PIE Toolbox allows selection of relevant structural and functional brain patterns, computation of their expression values in regions of interest, classification using support vector machines, and evaluation of model performance via cross-validation. This approach enables the use of patterns as features of intergroup differences for individual diagnosis. The software has been validated on both simulated and ADNI datasets.

11

Normative Speech Modeling for ALS Diagnosis with Application to Other Neurodegenerative Diseases

Shah, M.

2026-05-27 neurology 10.64898/2026.05.25.26354057 medRxiv

Top 2%

0.0%

Amyotrophic lateral sclerosis (ALS) is a progressive neurodegenerative disease affecting more than 450,000 individuals worldwide and is frequently diagnosed more than 12 months after symptom onset, delaying intervention during a critical early window. Because up to 80% of patients develop dysarthria within two years, subtle changes in speech provide a signal of early bulbar motor neuron degeneration. However, existing speech-based systems rely on supervised classification trained on limited datasets, achieving moderate sensitivity and depending heavily on labeled disease examples, which restrict scalability and early detection. This study introduces SPEAK-NORM, the first-ever normative speech modeling framework for early ALS diagnosis, which learns age- and sex-conditioned motor-speech distributions exclusively from healthy individuals. A conditional variational autoencoder models coordination of hypoglossal, laryngeal, and respiratory motor pathways, and deviation from this healthy manifold is quantified through latent representations and reconstruction error to form a 354-dimensional profile. A calibrated linear Support Vector Machine performs subject-level classification under subject-disjoint validation. On the VOC-ALS database (n = 153), SPEAK-NORM achieves 98% accuracy with balanced sensitivity and specificity, significantly outperforming established clinical acoustic indices and prior systems. The framework maintains strong performance under cross-task generalization and when retrained on healthy controls in independent dementia and Parkinson disease cohorts, demonstrating disease-specific deviation patterns rather than generic neurodegenerative change. Spectral, temporal, and latent separations further support interpretability. 
By modeling healthy speech instead of memorizing disease examples, SPEAK-NORM enables scalable early neuromotor screening using recording devices, with potential to support earlier diagnosis, differential classification, and monitoring of ALS progression.

12

Vaginal Antisepsis for Major Gynecologic Surgeries Using Chlorhexidine Gluconate versus Povidone Iodine: A Systematic Review and Meta-Analysis

Dias, Y.; Gebrekidan, F.; Lowder, J.; Sutcliffe, S.; Yaeger, L.

2026-05-27 obstetrics and gynecology 10.64898/2026.05.26.26353429 medRxiv

Top 3%

0.0%

OBJECTIVE: We performed a systematic review and meta-analysis (SRMA) of post-surgical outcomes, comparing chlorhexidine gluconate (CHG) versus povidone iodine (PI) for vaginal antisepsis in major gynecologic procedures. DATA SOURCES: Ovid Medline, Embase, Scopus, Cochrane, and Clinicaltrials.gov were searched between 1986 and December 2023 for studies comparing CHG with PI for vaginal antisepsis in major gynecologic operations. STUDY ELIGIBILITY CRITERIA: We included randomized controlled trials (RCTs) and non-RCTs comparing CHG to PI for vaginal antisepsis in major gynecologic operations. The primary outcome was surgical site infections (SSIs) and the secondary outcomes were urinary tract infections (UTIs) and vaginal irritation. METHODS: Summary estimates were calculated by fixed effects models when I² ≤ 25% and by random effects models when I² > 25%. Statistical analysis was performed using RevMan 5.4.1. The protocol for this systematic review was registered on PROSPERO (ID CRD42022378101). RESULTS: Nine studies met the inclusion criteria, four of which were RCTs. A total of 9538 patients were included, 4300 (45%) of whom were allocated to CHG and 5238 (55%) to PI. No statistically significant difference in SSI incidence was found for vaginal antisepsis with CHG versus PI in pooled analyses (n = 9538 patients; RR 1.20; 95% CI 0.92-1.57; I² = 0%). In contrast, a significantly higher risk of UTIs was observed for vaginal antisepsis with CHG than with PI (n = 6061 patients; RR 1.48; 95% CI 1.03-2.14; I² = 0%). CONCLUSION: In our SRMA, there were no significant differences in SSI risk when either CHG or PI was utilized for antiseptic vaginal preparation. Interestingly, vaginal antisepsis with PI was associated with a lower incidence of post-operative UTIs following major gynecologic surgery. Our findings support current guidelines that either form of vaginal antisepsis can be used for SSI prevention. 
They also suggest that PI may result in fewer postoperative UTIs but further randomized studies are needed to support these findings. Key words: surgical site infection, surgical wound infection, urinary tract infection, urogynecologic surgery, Chlorhexidine, Povidone Iodine, surgical antiseptic,
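
The model-selection rule stated in the METHODS of this abstract (fixed effects when I² is at most 25%, random effects otherwise) can be sketched as below. This is an illustrative sketch only: the analysis itself was run in RevMan, the helper names `i_squared` and `choose_model` are hypothetical, and the inputs are made-up log risk ratios with their variances. I² is computed with the standard formula from Cochran's Q using inverse-variance weights.

```python
def i_squared(effects, variances):
    """Higgins' I² (in percent) from per-study effects and their variances.

    Uses inverse-variance weights: Q = sum(w_i * (y_i - pooled)^2),
    I² = max(0, (Q - df) / Q) * 100 with df = number of studies - 1.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0.0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def choose_model(i2_percent, threshold=25.0):
    """Apply the abstract's rule: fixed effects if I² <= threshold, else random."""
    return "fixed" if i2_percent <= threshold else "random"


# Hypothetical log risk ratios and variances, for illustration only.
homogeneous = i_squared([0.20, 0.20, 0.20], [0.04, 0.05, 0.03])
heterogeneous = i_squared([0.0, 1.0], [0.01, 0.01])
print(choose_model(homogeneous), choose_model(heterogeneous))
```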

13

An ECG foundation model for generalizable cardiac function prediction across the lifespan

Yang, Y.; Peracchio, L.; Mayourian, J.; Miller, T.; La Cava, W.

2026-05-27 health informatics 10.64898/2026.05.26.26354128 medRxiv

Top 3%

0.0%

Background Artificial intelligence-enhanced electrocardiography (AI-ECG) enables scalable, low-cost cardiac dysfunction screening, but existing models are annotation-intensive and predominantly adult-derived, leaving paediatric generalizability uncertain. Paediatric cohorts exhibit highly variable cardiac morphology and function compared to adults, which may be useful for learning generalizable AI-ECG models. Methods We pretrained ECG-Fyler on a predominantly paediatric, all-age cohort at Boston Children's Hospital (1992-2023), annotated with a cardiology-specific coding system (Fyler codes), and evaluated it on assessments from echocardiography (echo) and cardiac magnetic resonance (CMR) studies. We validated on an external adult cohort from Columbia University Irving Medical Center. Performance was benchmarked against several AI-ECG foundation models by AUROC across age groups, lesion types, and limited-data scenarios. Findings The pretraining cohort comprised 782,138 ECGs from 255,271 patients (median age: 10.9 years, IQR: [2.8-16.8]). Internal evaluation included 178,495 ECG-echo pairs (median age: 10.9 [3.7-17.0]) and 8,584 ECG-CMR pairs (median age: 20.7 [15.6-29.6]). External validation included 82,543 ECG-echo pairs from adults (median age: 64.0 [52.0-74.0]). ECG-Fyler improved AUROC across biventricular dysfunction and dilation tasks, with the largest gains in low-data settings. In internal validation, ECG-Fyler detected low left ventricular ejection fraction (LVEF ≤ 40%) from only 100 fine-tuning samples (AUROC: 0.80, 95% CI: [0.78-0.80]), outperforming other models (AUROC < 0.65) and improving with additional fine-tuning (AUROC: 0.94 [0.93-0.94]). Similar improvements were observed for CMR-derived LVEF, RVEF, and ventricular dilation. In external validation on adults, ECG-Fyler exhibited an AUROC of 0.83 (CI: [0.82-0.85]) for LVEF ≤ 40%. 
After fine-tuning on less than 10% of external data, LVEF ≤ 45% performance (AUROC: 0.87 [0.86-0.88]) outperformed a fully trained, site-specific prior model (AUROC: 0.85 [0.84-0.87]). Interpretation Pretraining on richly annotated, paediatric-dominant ECGs yields models that transfer efficiently across institutions and ages, supporting AI-ECG screening and triage when labels or imaging access are limited. Funding National Institutes of Health (R01LM012973); Kostin Innovation Fund, Boston Children's Hospital

14

Patient Versus Prediction-Level Evaluation of a Dynamic Clinical Prediction Model of Sepsis

Tuttle, M.; Maas, C. C. H. M.; An, J.; Wessler, B. S.; Harvey, W. F.; Selker, H. P.; van Klaveren, D.; Kent, D. M.

2026-05-27 health systems and quality improvement 10.64898/2026.05.26.26354141 medRxiv

Top 3%

0.0%

The Epic Sepsis Model version 2 (ESMv2) is a prediction model embedded in the electronic medical record that warns clinicians which hospitalized patients are at risk for sepsis. We conducted a retrospective cohort study of 31,951 hospitalizations of 25,760 patients to compare analyses conducted at the commonly used patient level (where the maximum prediction prior to the onset of sepsis is used to measure performance) with a novel prediction level (where each prediction is used to measure performance). Sepsis, defined by the Sepsis-3 criteria, occurred during 1,049 hospitalizations (3.3%). Patient-level analyses suggested excellent discrimination (AUC 0.86; IQR 0.85, 0.87), whereas prediction-level analyses demonstrated lower performance (AUC 0.62; IQR 0.57, 0.65). Low estimates of the positive predictive value (14.5% at the patient level vs 4% at the prediction level) imply a high number of false alerts. Common evaluation approaches may overstate the performance of dynamic prediction models and mislead clinical decision-making.
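
The contrast between the two evaluation levels described in this abstract can be illustrated with a small sketch. It rests on simplifying assumptions that are not taken from the study: every prediction inherits the outcome label of its hospitalization, all listed scores are treated as occurring before sepsis onset, and the stays and scores are hypothetical. The `auc` helper is a plain rank-based estimate, not the authors' implementation.

```python
def auc(scores, labels):
    """Rank-based AUC: probability that a positive outscores a negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical stays: id -> (risk scores over time, sepsis outcome 0/1).
stays = {
    "A": ([0.1, 0.2, 0.9], 1),
    "B": ([0.3, 0.4, 0.5], 0),
    "C": ([0.1, 0.1, 0.6], 0),
    "D": ([0.2, 0.7, 0.3], 1),
}

# Patient level: one score per stay, the maximum prediction.
patient_scores = [max(scores) for scores, _ in stays.values()]
patient_labels = [label for _, label in stays.values()]

# Prediction level: every score counts (assumption: labeled by stay outcome).
pred_scores = [s for scores, _ in stays.values() for s in scores]
pred_labels = [label for scores, label in stays.values() for _ in scores]

patient_auc = auc(patient_scores, patient_labels)
prediction_auc = auc(pred_scores, pred_labels)
print(f"patient-level AUC={patient_auc:.2f}, prediction-level AUC={prediction_auc:.2f}")
```

Taking the maximum per stay discards the many low scores that septic patients also receive, which is one way patient-level summaries can look better than the stream of individual alerts a clinician actually sees.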

15

Morphological feature remodeling of intracranial arteries in the context of inflammation and HIV-associated cognitive impairment

Hoang, N.; Yang, H.; Uddin, M. N.; Zhong, J.; Faiyaz, A.; Singh, M. V.; Boodoo, Z. D.; Sutton, K. R.; Wang, H. Z.; Sahin, B.; Khan, M. W.; Weber, M. T.; Yuan, C.; Chen, L.; Schifitto, G.

2026-05-27 hiv aids 10.64898/2026.05.19.26353071 medRxiv

Top 3%

0.0%

Background: Despite the success of combination antiretroviral therapy (cART), vascular comorbidities, including cerebrovascular disease, are more prominent in people living with HIV (PLWH) compared to people without HIV (PWOH). However, quantitative assessments of cerebrovascular morphometry and their associations with cognitive outcomes in the context of HIV are still limited. In this study, we explore this missing link. Methods: Magnetic Resonance Angiography (MRA) data, blood markers, and neurocognitive assessments were collected from 73 PWOH subjects (male: 57, female: 16; age: 53 ± 16) and 99 PLWH subjects (male: 66, female: 30, age: 53 ± 11). Vessel morphometric features were quantified using intraCranial Artery Feature Extraction (iCafe) to investigate associations between vessel morphometry, markers of monocytes, endothelial cell activation, and cognitive performance. Results: HIV status predicted a lower total number of branches (β = -0.224, p = 0.001, d = -0.517) and shorter total distal length (β = -0.173, p = 0.021, d = -0.370) with a moderate effect size. Total branch number was found to be negatively associated with plasma levels of monocyte markers (sCD14: r = -0.167, p = 0.033; sCD163: r = -0.157, p = 0.045) and positively correlated with white matter cerebral blood flow (r = 0.550; p ≤ 0.05). HIV status was the strongest predictor of overall cognitive performance in the ANCOVA model (β = -0.219, p = 0.006, d = -0.453). Conclusions: Our results suggest that cognitive impairment in PLWH is associated with vessel morphology metrics. Monocyte immune activation may contribute to changes in vessel morphology.

16

Can Large Language Models Diagnose Primary Immunodeficiency from Patient-Described Symptoms?

Reteig, L. C.; Woloshin, S.; Maglione, P. J.; Farmer, J. R.; Ong, M.-S.

2026-05-27 allergy and immunology 10.64898/2026.05.26.26353818 medRxiv

Top 3%

0.0%

Patients with primary immunodeficiency (PID) often face prolonged diagnostic delays and may increasingly turn to large language models (LLMs) to interpret their symptoms during this period. We evaluated whether an LLM could recognize PID from symptom descriptions derived from interviews with 21 PID patients. In a prior study, we showed that GPT-4o identified PID in 96% of cases when prompted with physician-written patient histories (Rider et al., JACI, 2024). Here, when prompted with symptom descriptions in patients' own words, GPT-5 identified PID in only 7 cases (33%), although it more broadly suggested immune system issues in 18 cases (81%). The gap between these findings indicates that LLMs are sensitive to the language and framing of symptom descriptions, performing substantially worse when patients describe their own symptoms in everyday language than when clinicians summarize patient histories in structured medical terms. This study underscores the need to carefully evaluate how LLMs are used in patient-facing applications.

17

Hierarchical organ aging signatures from routine abdominal CT add incremental disease risk stratification beyond blood biomarkers

Deng, Z.; Wang, Y.; Shi, Y.; Wang, L.; Qureshi, T. A.; Gaddam, S.; Javed, S.; Hsu, Y.-C.; De Righi, D. R.; Azab, L.; Diwan, G.; Yang, J. D.; Xie, Y.; Yuan, C.; Vendrami, C. L.; Rodriguez, A.; Specht, K.; Jeon, C. Y.; Chaudhry, H.; Buxbaum, J.; Pisegna, J. R.; Yaghmai, V.; Goessling, W.; Hernandez-Barco, Y. G.; Miller, F. H.; Tirkes, T.; Espinoza, S.; Musi, N.; Dey, D.; Sung, K. H.; Pandol, S. J.; Li, D.

2026-05-27 radiology and imaging 10.64898/2026.05.19.26353206 medRxiv

Top 3%

0.0%

Biological aging is heterogeneous across organ systems, yet whether CT-derived abdominal aging provides prognostic value beyond routine clinical data, and whether organ decomposition adds value beyond a unified estimate, remain untested. We developed organ-specific and ensemble biological age models from radiomic features across five abdominal organs in 68,675 CT scans from 32,883 subjects, and evaluated their alignment with chronological age in healthy subjects (nested cross validation: MAE=3.68 years, R^2=0.90). In sequential analyses restricted to adults aged 20-60 years, the stratum with the strongest association between biological age gap (BAG) and disease, ensemble biological age gaps provided incremental prognostic value beyond demographic covariates for all-cause disease and mortality (Delta C-index=0.141, 0.051) and beyond routine blood biomarkers (Delta C-index=0.048), confirming that CT-derived aging captures structural information beyond laboratory markers. Organ-specific biological age added incremental prognostic value beyond the ensemble selectively for focal diseases: cardiovascular (aorta, Delta C-index=0.091) and hepato-pancreatic (pancreas, Delta C-index=0.096). These findings establish a hierarchical organization of CT-derived biological aging, positioning routine CT as a source that adds prognostic value to existing clinical biomarkers.

18

ERBB4 deficiency promotes atrial myopathy underlying the atrial fibrillation substrate

Yamaguchi, N.; Santucci, J.; Hong, S. J.; Ferrena, A.; Schlamp, F.; Willett, D.; Casdin, C. J.; Park, P. S.; Lin, X.; Xiao, J.; Hall, S.; Barnard, J.; Achter, J.; Kanhert, K.; Lundby, A.; Chung, M. K.; Van Wagoner, D. R.; Park, D. S.

2026-05-27 cardiovascular medicine 10.64898/2026.05.26.26354173 medRxiv

Top 3%

0.0%

Background: Atrial fibrillation (AF) is a leading cause of stroke, cardiovascular morbidity, and mortality. Atrial myopathy, characterized by progressive metabolic, electrical, and structural changes, creates the arrhythmogenic substrate that drives AF. Defining the key drivers of atrial myopathic processes is essential for targeted therapies that can mitigate AF progression. Here we explore how reduced ERBB4 expression contributes to the development of left atrial myopathy. Methods: We analyzed the Cleveland Clinic Biobank to compare left atrial ERBB4 levels in patients grouped by AF diagnosis. To investigate the impact of reduced ERBB4 levels on atrial tissue substrate, we created mouse models of cardiac-specific Erbb4 deficiency using Mlc2a (myosin light chain 2a)-Cre. Comprehensive physiological assessments were performed. Transcriptomic analyses of the left atrium were performed in an Erbb4 haploinsufficient mouse model and compared with human atrial datasets. Molecular validation of key dysregulated pathways was performed. Results: We found that left atrial ERBB4 levels are reduced in patients with AF. Adult cardiomyocyte-specific Erbb4 heterozygous (Erbb4fl/+;Mlc2a-Cre) mice exhibited prolonged P-wave duration in the absence of ventricular dysfunction. Left atrial transcriptomic analysis in Erbb4 haploinsufficient mice showed upregulation of pathways related to fibrosis, apoptosis, and coagulation, and downregulation of pathways related to fatty acid metabolism and mitochondrial function, mirroring changes observed in pressure overload mouse models. A cross-species transcriptomic comparison revealed significant overlap between ERBB4-correlated gene expression and functional pathways in adult human atria and mice with Erbb4 haploinsufficiency. Validating the transcriptomic data, protein and functional assays demonstrated increased fibrosis, apoptosis, and oxidative stress in the mutant left atrial tissue. Conclusion: Left atrial ERBB4 levels are reduced in AF patients. A mouse model of Erbb4 deficiency and human atrial transcriptomic analyses highlight a role for ERBB4 in supporting normal atrial metabolism while protecting against inflammation, apoptosis, and fibrosis.

19

Early Life Determinants of Forward Compression Wave Intensity in Adults

Haynes, A.; Mynard, J. P.; van der Veen, M.; Carson, J.; Green, D. J.

2026-05-27 cardiovascular medicine 10.64898/2026.05.26.26354176 medRxiv

Top 3%

0.0%

Intro: Characteristics of the pulse wave transmitted through the carotid arteries are predictive of cognitive decline and cerebrovascular health in humans. This study aimed to identify risk factor trajectories in childhood, adolescence and early adulthood that are associated with forward compression wave intensity (FCWI) in the common carotid artery in adults aged 28 years. Methods: Systolic blood pressure (SBP), body mass index (BMI) and fasting blood glucose (FBG), measured at multiple time-points when participants were aged 8-20 years, were included in a trajectory analysis. At age 28 years, FCWI was measured in 402 (M=206, F=196) participants who underwent a Duplex ultrasound assessment of the common carotid artery. Statistical analysis assessed differences in FCWI between trajectory groups for males and females separately. Results: In males, four trajectory groups were identified for BMI, three for SBP, and two for FBG. In females, three trajectory groups were identified for each of BMI, SBP, and FBG. In males, having higher BMI (P=0.006), SBP (P=0.021) and FBG (P=0.002) from ages 8-20 years was associated with greater FCWI at age 28 years. In females, no associations were found between FCWI at age 28 years and trajectory groups for BMI (P=0.185), SBP (P=0.289) or FBG (P=0.070). Conclusion: Having high BMI, SBP and FBG throughout childhood, adolescence and early adulthood was associated with higher FCWI in the carotid artery at age 28 years in males, but not females. This may have a direct impact on the etiology of cognitive decline and cerebrovascular disease in later life.

20

Dentine markers of pre/early postnatal lead exposure links with brain, cognitive, and behavioral outcomes in adolescents

Marshall, A. T.; Kan, E.; Adise, S.; König, M.; McConnell, R.; Martinez, M.; Midya, V.; Arora, M.; Sowell, E. R.

2026-05-27 pediatrics 10.64898/2026.05.26.26354134 medRxiv

Top 3%

0.0%

Lead is a toxic metal ubiquitous in our environment. While dramatic reductions in lead sources have paralleled equivalent decreases in lead-poisoning rates, chronic lead exposure remains a critical public health concern. Childhood lead exposure (at its lowest levels) is linked to changes in cognitive development, but less is known about lead's effects on children's brain structure, especially as a result of in utero exposure. We measured prenatal and early-postnatal lead exposure in shed deciduous teeth of 448 9- and 10-year-old children (from 20 United States cities) and linked those lead levels to childhood brain structure, cognition/behavior, and neighborhood- and family-level socioeconomic characteristics. Here we show negative associations between tooth-lead levels and the thickness of the brain's cortex, particularly in regions linked to language processing. With increasing tooth-lead levels, children of lower-income (versus higher-income) families showed steeper declines in receptive vocabulary. Caregiver-reported behavioral problems exhibited similar associations. With in utero exposure linked to adverse neurodevelopmental outcomes (well before lead exposure and its risks are evaluated by healthcare professionals), prenatal screening of maternal lead levels/exposure, coupled with recommended strategies to reduce its placental transmission, may help reduce lead's effects on future generations.